Vanishing gradient problem
The tendency for the gradients of the early hidden layers of some DNNs to become very small (close to zero). Increasingly small gradients produce correspondingly small updates to the weights of neurons in a DNN, leading to little or no learning. Models suffering from the vanishing gradient problem become difficult or impossible to train. LSTM cells address this issue.
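The effect can be demonstrated with a minimal sketch (assumed setup, using NumPy): backpropagating through a stack of sigmoid layers multiplies the gradient by the sigmoid's derivative (at most 0.25) at every layer, so the gradient reaching the earliest layers shrinks toward zero as depth grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

depth, width = 30, 16  # illustrative sizes, chosen arbitrarily

# Forward pass through `depth` sigmoid layers
x = rng.normal(size=(1, width))
activations = [x]
weights = []
for _ in range(depth):
    W = rng.normal(scale=0.5, size=(width, width))
    weights.append(W)
    activations.append(sigmoid(activations[-1] @ W))

# Backward pass: start with a unit gradient at the output and
# record the gradient's magnitude as it reaches each earlier layer.
grad = np.ones((1, width))
norms = []
for layer in reversed(range(depth)):
    a = activations[layer + 1]
    grad = (grad * a * (1 - a)) @ weights[layer].T  # sigmoid'(z) = a(1-a)
    norms.append(np.linalg.norm(grad))

print(f"gradient norm at last hidden layer:  {norms[0]:.3e}")
print(f"gradient norm at first hidden layer: {norms[-1]:.3e}")
```

Running this shows the gradient norm at the first hidden layer is many orders of magnitude smaller than at the last, which is why the earliest layers learn little or nothing.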